ABSTRACT

Machine learning is a subfield of artificial intelligence. In this work, we use machine learning
methods such as K-nearest neighbor (KNN) and linear regression to detect and diagnose illnesses.
Models are trained on the dataset using supervised learning and reinforcement learning methods in
order to construct a mathematical model. The datasets are used for data analysis and illness
diagnosis. The purpose of the Disease Prediction using Machine Learning (ML) system is to make
predictions about diseases based on the symptoms reported by patients or other users. The user
inputs their symptoms, and the system returns the likelihood that they have a certain ailment. In
machine learning, disease prognosis relies on disease prediction.
1. INTRODUCTION

ML is a subfield of AI that enables computers to "self-learn" from datasets and gradually improve in
performance without being explicitly programmed. Machine learning algorithms can spot patterns in
data and use them to make forecasts. A machine learning system learns from past data, builds
predictive models, and then applies those models to fresh data to anticipate an outcome. In general,
a larger and more representative dataset allows a better model to be constructed, and therefore
more accurate predictions.
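
As a minimal sketch of this learn-then-predict workflow, the following pure-Python k-nearest
neighbor classifier stores labelled past records and predicts a label for a new record by majority
vote among its k closest neighbors. The binary symptom encoding, the toy records, and the names
`TRAIN` and `knn_predict` are illustrative assumptions, not data or code from the surveyed work.

```python
from collections import Counter
import math

# Hypothetical toy data: each record is (symptom vector, label).
# Assumed symptom order: fever, cough, fatigue, rash (1 = present, 0 = absent).
TRAIN = [
    ((1, 1, 1, 0), "flu"),
    ((1, 1, 0, 0), "flu"),
    ((1, 0, 1, 0), "flu"),
    ((0, 0, 0, 1), "allergy"),
    ((0, 1, 0, 1), "allergy"),
    ((0, 0, 1, 1), "allergy"),
]

def knn_predict(train, query, k=3):
    """Return the majority label among the k training records nearest to query."""
    # Rank past records by Euclidean distance to the new record.
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Apply the "model" (the stored records) to a fresh, unseen record.
prediction = knn_predict(TRAIN, (1, 1, 1, 0))
```

With so few records the prediction is only illustrative; in practice the choice of k and of the
distance measure would be tuned on held-out data.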

1.1 CLASSIFICATION OF MACHINE LEARNING 

Semisupervised learning 

Regression and prediction are just some of the techniques that benefit from this form of learning.
When the cost of labelling prevents a completely labelled training procedure, semisupervised
learning can be a helpful alternative.

Reinforcement learning 

In reinforcement learning, the algorithm learns by trial and error which kinds of behaviour are most
likely to result in positive outcomes. There are three main parts to this sort of learning: the
agent, the environment, and the actions taken by the agent. The goal is to have the agent make
decisions that maximise the expected reward over some time horizon. If the agent follows a sound
policy, it will reach the goal considerably more quickly. The objective of reinforcement learning is
therefore to figure out which course of action works best.
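
The agent, environment, and action loop can be sketched with a simplified single-state problem (a
multi-armed bandit) solved by an epsilon-greedy agent. The fixed reward table, the optimistic
initial estimates, and the names `REWARDS`, `environment_step`, and `run_agent` are assumptions made
for illustration; real environments usually return noisy rewards and have many states.

```python
import random

# Hypothetical environment: three actions with fixed rewards (an assumption
# chosen so the example is reproducible).
REWARDS = [0.2, 0.5, 0.8]

def environment_step(action):
    """Return the reward the environment gives for the chosen action."""
    return REWARDS[action]

def run_agent(steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(REWARDS)
    # Optimistic initial estimates encourage every action to be tried early.
    value = [1.0] * n_actions
    count = [0] * n_actions
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit
        reward = environment_step(action)
        count[action] += 1
        # Incremental average of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
        total_reward += reward
    best_action = max(range(n_actions), key=lambda a: value[a])
    return best_action, value, total_reward

best_action, value, total_reward = run_agent()
```

After a few trials the value estimates identify the highest-reward action, and the agent spends
most of the remaining steps exploiting it.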

1.2 DIAGNOSIS OF DISEASES 

Machine learning's potential in areas like illness diagnosis and management ensures it will play an
increasingly important role in the healthcare industry. When applied to illness diagnosis, machine
learning techniques can allow for faster decision making with fewer false positives. This review
covers several popular machine learning techniques. These algorithms are used in the diagnosis of
cancer, diabetes, epilepsy, heart attacks, and other significant ailments. Their diagnostic
performance is assessed using the accuracy, precision, recall, and F1 score statistics.
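
For reference, the four statistics named above can be computed from the counts of true and false
positives and negatives. The sketch below assumes binary labels with 1 as the positive (diseased)
class; the function name `classification_metrics` and the toy labels are illustrative only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the cases flagged as positive, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive cases, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example: 4 diseased and 6 healthy patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here one diseased patient is missed and one healthy patient is flagged, so precision and recall are
both 3/4 while accuracy is 8/10.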
2. LITERATURE SURVEY

Nicolas Vivaldi and Meijun Ye [1] assessed the efficacy of popular supervised machine learning
algorithms for differentiating patients with a previous history of traumatic brain injury (TBI) from
those with a stroke history and from subjects with a normal electroencephalogram (EEG). Models using
a rich feature set extracted from the Temple EEG Corpus were developed for two-class classification
of TBI patients vs normal subjects and for three-class classification of TBI, stroke, and normal
individuals. Both two-class and three-class classification performed exceptionally well with LDA
feature selection and support vector machine models in both cross-validation and independent
validation. When comparing TBI and stroke patients to healthy controls, the authors found that
coherence and relative power spectral density in the delta frequency range were decreased, whereas
power in the alpha, mu, beta, and gamma ranges was increased.

Simona Turco, Aya Kamaya, Thodsawit Tiyarattanachai, Kambez Ebrahimkheil, John Eisenbrey, and Ahmed
El Kaffas [2] developed an interpretable radiomics technique to distinguish between benign and
malignant focal liver lesions (FLLs) on contrast-enhanced ultrasonography (CEUS). Although CEUS has
shown promise for differential FLL diagnosis, qualitative examination of contrast enhancement
patterns is still the only method used in clinical assessment. Quantitative analysis is essential,
but it is sometimes complicated by motion artefacts and the intricate spatiotemporal architecture of
liver contrast enhancement, which consists of many overlapping vascular phases. The work involves
training machine learning classifiers and optimising their performance. The location of a suspected
lesion must be entered manually.

B. Deepa, M. Murugappan, M. G., and Mabrook S. Al-Rakhami [3] suggest a unique approach to
classifying brain abnormalities. The complex amplitude data for the sample is encoded as fringe
patterns in the raw digital hologram. The training approach uses deep and feature-based machine
learning models to extract this data automatically, without resorting to the time-consuming and
error-prone traditional reconstruction method.

Thanh Minh Vo, Tan Nhat Pham, and Son Vu Truong Dao [4] employed Gray Wolf Optimization and Adaptive
Particle Swarm Optimization to develop multilayer perceptrons in order to detect diabetes.

Gazara, Muaffaq M. Nofal, Sohom Chakrabarty, and M. Mursaleen [5] suggest a pseudo-reinforcement
learning method that addresses the major class imbalance problem in a limited dataset by
incorporating simulated data into the major parameter space.

A comprehensive framework for phenotyping biologic samples was devised by Mattia Delli Priscoli,
Lisa Miccio, Francesco Bardozzo, and others [6]. It involves fusing computational holography and
label-free individual unit detection in a transmit optical system.

Using data from nationally representative surveys of people's health and demographics, Kamrul Hasan,
Tasnim Jawad, Akhtarul Islam, Mehedi Masud, and Jehad F. Al-Amri [7] classify measles vaccine use
and identify the factors that contribute to it using an ensemble machine learning approach. Several
methods of missing value imputation and feature selection were used to determine the most important
characteristics for predicting measles vaccination. Grid search was used to tune the
hyperparameters of several machine learning models, including Naive Bayes, random forest, decision
tree, XGBoost, and LightGBM. Using the BDHS dataset, the authors report the classification
performance of each individually optimised machine learning model and of all their ensembles. With
the same preprocessing, the proposed weighted ensemble of XGBoost and LightGBM gave results
promising enough to advocate its use for measles vaccination prediction.
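
The idea of a weighted ensemble tuned by grid search can be sketched as follows: two models each
output a positive-class probability, the ensemble takes a weighted average, and the weight is chosen
from a small grid by validation accuracy. The probabilities, labels, grid, and function names below
are hypothetical stand-ins and do not reproduce the models or data of [7].

```python
def blend(prob_a, prob_b, weight_a):
    """Weighted average of two models' positive-class probabilities."""
    return [weight_a * pa + (1.0 - weight_a) * pb
            for pa, pb in zip(prob_a, prob_b)]

def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

def grid_search_weight(prob_a, prob_b, labels, grid):
    """Return the first weight in the grid with the highest accuracy."""
    best_weight, best_score = None, -1.0
    for w in grid:
        score = accuracy(blend(prob_a, prob_b, w), labels)
        if score > best_score:
            best_weight, best_score = w, score
    return best_weight, best_score

# Hypothetical validation-set outputs of two already-trained models.
PROB_A = [0.9, 0.3, 0.4, 0.6]
PROB_B = [0.4, 0.8, 0.1, 0.3]
LABELS = [1, 1, 0, 0]

best_weight, best_score = grid_search_weight(
    PROB_A, PROB_B, LABELS, [0.0, 0.25, 0.5, 0.75, 1.0])
```

In this toy case neither model alone classifies every sample correctly, but a blend does. The weight
should be selected on validation data that is kept separate from the final test set.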

Sumbal Maqsood, Shuxiang Xu, Matthew Springer, and Rami Mohawesh [8] carried out a detailed
examination of feature extraction approaches for estimating blood pressure from photoelectric
plethysmography data. To examine the relevance of each approach, the authors subdivided the features
into three categories: Group A contains time-domain features, Group B contains statistically
extracted features, and Group C contains frequency-domain features. Multiple machine learning
techniques were used in the investigation, and their results were examined from a variety of angles.
Two publicly accessible datasets were used to show that the Group A features were more reliable
than the other feature sets for blood pressure estimation.

To obtain precise readings of blood pressure, Xiaohui Chen, Shuyang Yu, Yongfang Zhang, Fangfang
Chu, and Bin Sun [9] established a support vector machine regression model and a random forest
regression model. Photoelectric plethysmography and electrocardiogram readings were collected from
persons of varying ages, and blood pressure was estimated from these high-quality physiological
signals and the vascular elastic cavity model. The blood pressure prediction model takes personal
features as input parameters. Prediction performance may be enhanced by experimenting with various
parameter settings in the model, and the best-performing model is chosen to produce reliable
readings. Experimental results show that the optimised random forest model outperforms the support
vector machine regression model under the same conditions, with an average absolute error for
diastolic and systolic blood pressure of less than 5 mmHg, in line with the mercury
sphygmomanometer method.

The MaLCaDD (Machine Learning based Cardiovascular Disease Diagnosis) framework was developed by
Aqsa Rahim, Yawar Rasheed, Farooque Abdul Wahab Muzaffar, and Muhammad Waseem Anwar [10]. The
framework first corrects for any discrepancies or missing data using a mean replacement method.
Features are then chosen with the feature importance method. For more precise forecasting, the
authors propose using a combination of logistic regression and k-nearest neighbor classifiers.
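
Mean replacement, the first preprocessing step described above, can be sketched in a few lines. The
sketch assumes missing values are stored as `None` and that each column has at least one observed
value (an all-missing column falls back to 0.0, which is an arbitrary choice); the function name
`impute_column_means` and the toy rows are hypothetical and not taken from [10].

```python
def impute_column_means(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in rows]

# Toy records with assumed columns: systolic pressure, cholesterol.
rows = [
    [120.0, 200.0],
    [None, 240.0],
    [140.0, None],
]
filled = impute_column_means(rows)
```

To avoid leaking information, the column means would normally be computed on the training split
only and then reused to fill the test split.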

Md. Rashed-Al-Mahfuz, Salem A. Alyami, Julian M. W. Quinn, Mohammad Ali Moni, Abedul Haque, and Akm
Azad [11] used machine learning to determine which features of clinical tests would help in the
early, accurate identification of chronic kidney disease (CKD). Selecting such features may save
both time and money throughout the diagnostic screening process. The authors used k-fold
cross-validation to compare the efficacy of different classifiers on datasets restricted to these
carefully chosen clinical test features. The proposed machine learning methods for CKD diagnosis
work particularly well with optimised datasets that include the relevant features. The features
examined came from inexpensive clinical tests, including urine and blood analysis, as well as other
clinical indicators. With the optimised and pathologically characterised feature set, the best
performing predictive model for CKD diagnosis was a random forest (RF) classifier.
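
The k-fold cross-validation procedure used for such comparisons can be sketched as below. The
sketch uses contiguous, unshuffled folds for simplicity (studies typically shuffle or stratify), and
the majority-class baseline, the toy labels, and all function names are illustrative assumptions
rather than the setup of [11].

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_val_accuracy(fit_predict, X, y, k=5):
    """Average test accuracy of fit_predict over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def majority_baseline(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)

# Toy data: one placeholder feature per sample, 8 positive and 2 negative labels.
X = [[i] for i in range(10)]
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
score = cross_val_accuracy(majority_baseline, X, y, k=5)
```

Any classifier that follows the same `fit_predict(X_train, y_train, X_test)` convention can be
passed in, so several models can be compared on identical folds.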

Molecular biomarkers are discussed in the study by Kai Shi, Wei Lin, and Xing-Ming Zhao [12].
Molecular biomarkers are individual molecules or groups of molecules that can aid in the diagnosis
or prognosis of a disease or ailment. As high-throughput technologies have developed, an enormous
quantity of molecular 'omics' data has been collected. These omics data allow for the screening of
potential biomarkers for illnesses and disorders. Several computational methods have been designed
for this purpose, and the study classifies the machine learning methods into supervised,
unsupervised, and recommendation techniques.

Inflammatory bowel diseases (IBDs), as defined by Davide Chicco and Giuseppe Jurman [13], are a
category of illnesses characterised by persistent inflammation of the small intestine and colon. The
two most frequent forms of IBD are Crohn's disease and ulcerative colitis. Patients with
inflammatory bowel disease are at increased risk of having an arterial event, such as a stroke or an
acute coronary syndrome. Once patients have been diagnosed with inflammatory bowel disease,
information on their risk of developing vascular disorders may be gleaned quickly and cheaply from
their electronic medical records using computational data mining methods. The authors looked at data
from 90 people with IBD, 30 of whom also had some form of vascular illness. After assessing the
capacity to predict the arterial event and identifying the most critical variables on the whole
dataset, they reran the analysis on the sample of 30 patients with IBD and arterial disease. An
arterial event and its subtype (stroke) may be accurately predicted from medical records using
machine learning, and the method can also rank the most significant clinical characteristics in the
dataset.

Jung-Gu Choi, Inhwan Ko, and Sanghoon Han [14] presented a machine learning based classification
system for assessing the severity of depressive episodes from actigraphy data. The authors used a
logistic regression model with fourteen features of the circadian rhythm (activity minimum,
amplitude, alpha, beta, acrotime, upmesor, downmesor, mesor, f_pseudo, interdaily stability (IS),
intradaily variability, etc.); it performed best in classifying depression levels out of the four
candidate classifiers. For purposes of feature extraction and classification, two days of actigraphy
data was ideal.

N. Garcia-D’urso, P. Climent-Pérez et al. [15] introduced a machine learning method for predicting
cholesterol levels from readily available and non-invasive data. Clinical and anthropometric
information collected by dietitians during weight reduction interventions is used. The goal of
analysing the predictive capacity of various patient factors is to boost the accuracy of
non-invasive diagnostics and the efficiency of screening for related disorders. A clustering study
identified different groups of patients that share specific traits which have been relatively hidden
but may contain crucial diagnostic or prognostic information.

Sarria E. A. Ashri, M. M. El-Gayar, and Eman M. El-Daydamony [16] included hybrid classifiers using
the majority voting approach to further enhance the prediction accuracy of the ensemble model. To
improve prediction performance and overall time consumption, a genetic algorithm-based preprocessing
and feature selection approach is presented.
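
Majority voting itself is simple to state in code: each classifier predicts a label for every
sample, and the ensemble outputs the label chosen by most classifiers. The three prediction lists
and the function name `majority_vote` below are hypothetical; with an even number of voters a tie
goes to the label seen first, which is an arbitrary convention of this sketch.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote."""
    combined = []
    # zip(*...) groups the votes of all models for the same sample.
    for sample_votes in zip(*predictions_per_model):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions of three classifiers on four patients
# (1 = heart disease predicted, 0 = not predicted).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

Each model here makes a different mistake, so the vote can correct errors that any single model
would make on its own.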

Senol Isci, Derya Sema Yaman Kalender, and Firat Yaman [17] addressed the accurate classification of
Cushing's syndrome, which plays an important role in early and correct diagnosis and may facilitate
treatment and improve patient outcomes. To arrive at an accurate diagnosis of Cushing's syndrome,
doctors need to look at a number of factors all at once, including the patient's history, the
results of several biochemical tests, and the results of medical imaging. With the goal of improving
the diagnosis, prognosis, and therapy of Cushing's syndrome, the authors apply machine learning
algorithms to evaluate and categorise patient data in order to showcase their potential as a
clinical decision support system.

Md. Abdul Awal, Md. Shahadat Hossain, Abdullah Al-Mamun Bulbul, Mehedi Masud, S. M. Hasan Mahmud,
and Anupam Kumar Bairagi [18] designed and optimised a machine learning-based approach to
identifying COVID cases using inpatient facility data. The dataset's COVID and non-COVID classes are
balanced using the Adaptive Synthetic (ADASYN) technique, and the models' hyperparameters are
optimised using Bayesian optimization. The authors report that the proposed strategy is efficient.

Guochang Ye, Vignesh Balasubramanian, John K-J. Li, and Mehmet Kaya [19] propose a strategy for
employing an artificial recurrent neural network in the continuous early prediction of intracranial
pressure (ICP) in patients with traumatic brain injury. Following preprocessing of the ICP data, the
learning model is developed for thirteen patients to continuously predict the ICP signal and
categorise events for the following 10 minutes.

Usama Ahmed, Muhammad Adnan Khan, Shabib Aftab, Muhammad Farhan Khan, Ghassan F. Issa, Raed A. T.
Said, Taher M. Ghazal, and Munir Ahmad [20] present a model for diabetes prediction that makes use
of a hybrid machine learning strategy. The conceptual framework employs two models: a support vector
machine model and an artificial neural network model. These models examine the data and conclude
whether or not a patient has diabetes.

P. Thirumoorthy, K. S. Bhuvaneshwari, C. Kamalanathan, P. Sunita, E. Prabhu et al. [21] presented a
key agreement-based Kerberos protocol for secure M-health data transmission across wireless
networks. The processed patient data is accessible to doctors and caregivers via a cloud server. To
preserve the secrecy and integrity of authentication, the suggested protocol is used for data
transfer between patients, servers, and physicians. The effectiveness of the suggested algorithm is
contrasted with that of the existing protocols.

K. Shanmugapriya, C. N. Marimuthu, N. Sridhar, and S. Sameema Begam [22] propose an anomaly
detection system whose goal is to identify IoT vulnerabilities and notify an organization's
executive or service administration. The proposed system uses the supervised machine learning
algorithms Random Forest (RF) and K-Nearest Neighbor (KNN) to adjust parameters in a distributed
network. The system leverages cross-validation (CV) to fit the model and compute a metric score,
maximising model performance without overfitting.

D. Vanathi, S. Prabhadevi, P. Sabarishamalathi, and Mohanraj K. P. [23] argue that, to achieve
private collaboration, it is imperative to use a distributed collaborative privacy-preserving
approach. Intrusion detection systems (IDSs) are critical components that are able to reduce threats
by spotting malicious behaviour. The need for nodes to exchange private data among themselves is a
significant barrier to collaborative learning.

Avadhesh Kumar Dixit, S. Karuppusamy, Sonu Kumar, and Jyothi N. M. [24] note that dermal sensor
networks used in pervasive medical technologies produce a very high volume of data that must be
constantly managed, preserved for current analysis, and used both now and in the future. Digital
technology is a relatively recent development that involves the management and interpretation of
personal information from electronic devices, particularly in conjunction with the underlying
concept of the Internet of Things (IoT).

D. Vanathi, P. Uma, M. Parvathi, and K. Shanmugapriya [25] observe that recommender systems have
proliferated in recent years. To meet the incredibly diverse needs of their clients, businesses like
Amazon and eBay offer a huge array of goods, so more and more options are available to customers. As
a result, a model or approach is needed to help clients locate what they genuinely need, at this new
level of personalization, from the vast amount of data offered by businesses.


3. COMPARATIVE ANALYSIS 
The comparative analysis compares the machine learning algorithms, parameter analysis, tools used,
and future improvements of the surveyed papers.


Table 3.1: Comparative Analysis 


S.No | Paper title | Techniques and Algorithms | Parameter Analysis | Tools used | Future Work
1 | EEG data-driven Machine Learning for Traumatic Brain [...] classification | [...] neighbor (KNN) and Principal Component [...] | Independent [...] | [...] | Some methods, such as [...] and CNN, may be used to enhance [...] dimensionalit[...]
4. CONCLUSION
The capacity of machine learning to help in the early diagnosis of disease has led to its widespread
use in the healthcare industry. It is only after illnesses have been detected that a diagnosis may be
made. In this work, we examine the diagnostic process for a number of disorders, including IBD,
IHD, and chronic kidney disease. Better decisions, the ability to spot trends and breakthroughs, and
increased research and clinical trial efficiency are all made possible by the efficient application
of machine learning in the healthcare industry. Machine learning has several potential applications
in healthcare, such as improved diagnostic accuracy, better pharmaceutical suggestions, readmission
prediction, and patient risk stratification. These predictions are grounded in the patient's
anonymized medical records and their reported symptoms.